Dimension Reduction and Discriminant Analysis for Japanese Connected Vowel Recognition

Authors

  • Satoshi Asakawa
  • Nobuaki Minematsu
  • Keikichi Hirose
Abstract

The aim of speech recognition is to extract only the linguistic information from speech signals. Acoustic variations caused by non-linguistic factors, such as speaker, communication channel, and noise, pose a challenging problem for speech recognition: the same text can lead to different acoustic observations with different speakers and environments. To deal with these variations, modern speech recognition approaches mainly use statistical methods (such as GMMs and HMMs) to model the distributions of acoustic features. These methods can achieve relatively high recognition rates when properly trained, but they always require a large amount of high-quality training data. This is very different from children's spoken language acquisition, in which children mainly rely on very biased training data from their mothers and fathers. This fact suggests that there may exist robust representations of speech that are nearly invariant to non-linguistic variations. Along this line, the third author of this paper proposed an invariant structural representation of speech signals, which tries to remove the non-linguistic factors in speech signals [1]. Unlike classical speech models, this structural representation focuses on the dynamic motions in speech and discards the static features. Mathematically, the structural representation is made up of Bhattacharyya distances (BD), which are invariant to invertible transformations of the feature space [2]. Our previous work has demonstrated the effectiveness and efficiency of this representation in both speech recognition tasks [3, 4] and computer-aided language learning (CALL) systems [5]. However, there are two limitations to using the structural representation directly for speech recognition: 1) its dimension is high, which not only increases the computational cost but also makes it susceptible to the curse of dimensionality [3]; 2) the invariance can be too strong, such that two linguistically different speech signals may have similar structural representations [4]. In this paper, we introduce techniques of dimension reduction and discriminant analysis to address these two problems. First, we build a structure for each sub-stream of the cepstrum features to overcome the overly strong invariance. Then we calculate a reduced structure vector for each sub-stream and apply linear discriminant analysis for final classification. The new represen-
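The invariance property mentioned in the abstract can be checked numerically. The sketch below is only an illustration, not the authors' implementation: it assumes each speech event is modeled as a multivariate Gaussian (the abstract does not state the distribution family), uses the standard closed-form Bhattacharyya distance between Gaussians, and applies a hypothetical affine map `A x + b` as the invertible transformation. The names `bhattacharyya_distance` and `structure_vector`, and all numeric values, are made up for the example.

```python
import numpy as np


def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between two multivariate Gaussians."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    mean_term = 0.125 * diff @ np.linalg.solve(cov, diff)
    # slogdet avoids overflow/underflow when taking log-determinants
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    return mean_term + 0.5 * (logdet - 0.5 * (logdet1 + logdet2))


def structure_vector(means, covs):
    """Stack the distances of all distribution pairs (i < j) into one vector."""
    n = len(means)
    return np.array([
        bhattacharyya_distance(means[i], covs[i], means[j], covs[j])
        for i in range(n) for j in range(i + 1, n)
    ])


# Assumption: five synthetic 3-dimensional Gaussians stand in for five vowels.
rng = np.random.default_rng(0)
dim, n_events = 3, 5
means = rng.normal(size=(n_events, dim))
covs = []
for _ in range(n_events):
    m = rng.normal(size=(dim, dim))
    covs.append(m @ m.T + dim * np.eye(dim))  # symmetric positive definite
covs = np.array(covs)

# Hypothetical invertible affine map (upper triangular, nonzero diagonal).
A = np.array([[2.0, 0.5, 0.0],
              [0.0, 1.5, -0.3],
              [0.0, 0.0, 0.8]])
b = np.array([1.0, -2.0, 0.5])
means_t = means @ A.T + b                       # mu -> A mu + b
covs_t = np.array([A @ c @ A.T for c in covs])  # Sigma -> A Sigma A^T

s = structure_vector(means, covs)
s_t = structure_vector(means_t, covs_t)
print(np.allclose(s, s_t))
```

With five distributions the structure vector has 10 entries, and the number of entries grows quadratically with the number of distributions, which is consistent with the dimensionality concern raised in the abstract.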

Similar resources

Recognition of Connected Japanese Vowel Utterances Using Random Discriminant Structure Analysis

Automatic speech recognition has to deal with the non-linguistic variations of speech signals. Many non-linguistic variations can be modeled as transformations of features. The universal structure of speech [12], [13] proves to be invariant to these feature transformations and thus provides a robust representation for speech recognition. One of the difficulties of using the structure repres...

Facial expression recognition based on Local Binary Patterns

Classical LBP has drawbacks, such as complexity and high-dimensional feature vectors, that make it necessary to apply dimension reduction. In this paper, we introduce an improved LBP algorithm that addresses these problems by using the Fast PCA algorithm to reduce the dimension of the extracted feature vectors. In other words, the proposed method (Fast PCA+LBP) is an improved LBP algorithm that is extracted ...

Supervised Feature Extraction of Face Images for Improvement of Recognition Accuracy

Dimensionality reduction methods transform or select a low-dimensional feature space to efficiently represent the original high-dimensional feature space of the data. Feature reduction is an important step in many pattern recognition problems in different fields, especially in the analysis of high-dimensional data. Hyperspectral images are acquired by remote sensors and human face images ar...

Speaker-independent consonant recognition by integrating discriminant analysis and hmm

In this paper, we propose a new consonant recognition method that integrates two stochastic methods: discriminant analysis and HMMs (hidden Markov models). Discriminant analysis is effective for analyzing local patterns around the reference point of a consonant, such as a burst point. This method, however, is based on the assumption that the reference point is detected precisely. HMM is able to extra...

A Multi Linear Discriminant Analysis Method Using a Subtraction Criteria

Linear dimension reduction has been used in different applications, such as image processing and pattern recognition. All these methods fold the original data into vectors and project them onto a small number of dimensions. But in some applications we may face data that are not vectors, such as image data. Folding multidimensional data into vectors causes the curse of dimensionality and mixes the differe...

Publication date: 2008